EC7412 Part II: Data Science for Economists
April 16, 2025
Introduction
Basic R syntax
When things don’t work
Logic
Subsetting
Scoping
R and VS Code installed
R extension
Basic R syntax
We can use R for arithmetic by executing operations in an R terminal:
Each row calls arithmetic functions on scalar objects (numbers).
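A minimal sketch of what such rows could look like (the numbers are illustrative, not the ones from the original slide):

```r
# Each row is one arithmetic operation on scalars
1 + 2     # addition: 3
7 - 3     # subtraction: 4
4 * 5     # multiplication: 20
10 / 4    # division: 2.5
2^3       # exponentiation: 8
7 %% 3    # remainder: 1
7 %/% 3   # integer division: 2
```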
Options:
Ctrl/cmd+enter to send the current row to the terminal in VS Code.
Run a whole script (.R) with Ctrl/cmd+shift+S.
We create new objects by assigning values to names:
This creates two scalars and assigns them the names "a" and "b".
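A short sketch of such an assignment; the values 1 and 2 are assumptions, since the original values are not shown:

```r
a <- 1   # assign the value 1 to the name a (hypothetical value)
b <- 2   # assign the value 2 to the name b (hypothetical value)
a + b    # objects can be used wherever their values could: 3
```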
typeof() tells us how objects are stored in memory
class() is more informative:
household_income_after_tax is much better than hhinc2.
NA (not available)
Inf, -Inf and NaN (not a number)
Data frames are collections (lists really) of vectors of the same length, organized in a table:
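As an illustration, a small data frame could be built and inspected like this. The name households and its columns are hypothetical and only serve the example:

```r
# Two vectors of the same length, organized in a table (hypothetical data)
households <- data.frame(
  name = c("Ann", "Bo", "Cy"),
  household_income_after_tax = c(310, 250, NA)
)

typeof(households)   # "list": a data frame is stored as a list of vectors
class(households)    # "data.frame"
typeof(households$household_income_after_tax)   # "double"
households
```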
Processing data in R, you will run most of your operations on data.frames (or similar objects like tibbles or data.tables).
Pipes (|>) make your code more readable:
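A small sketch of the same computation with and without pipes. The native pipe |> needs R 4.1 or later; the numbers are illustrative:

```r
# Without a pipe: read from the inside out
nested <- round(mean(sqrt(c(1, 4, 9))), 1)

# With the native pipe: read from left to right
piped <- c(1, 4, 9) |>
  sqrt() |>
  mean() |>
  round(1)

nested   # 2
piped    # 2
```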
Lines starting with # are not executed. Use comments to explain what you do, and to get code-completion suggestions from Copilot.
When things don’t work
?named_object
?package.name
Error in data[1]: object of type 'closure' is not subsettable
!?!?
Print the object in the terminal:
View() is useful for larger objects, try View(mtcars)
str() gives you the structure of the object
pillar::glimpse() is a nice alternative to str()
For functions, printing can give you the code. Try printing data.
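A sketch of these inspection tools on the built-in mtcars data. It also shows one way the "closure" error above can arise: if no object called data has been created, the name refers to a function from the utils package, and functions cannot be subset.

```r
head(mtcars, 3)   # print the first rows in the terminal
str(mtcars)       # structure: one line per column

# `data` is a function unless we have created an object with that name
data_is_function <- is.function(data)
data_is_function  # TRUE

# Subsetting a function fails; tryCatch() keeps the message instead of stopping
msg <- tryCatch(data[1], error = function(e) conditionMessage(e))
msg
```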
browser() to enter the debugger at a certain place
R debugger extension that makes this easier
Logic
We often want to run operations conditional on certain criteria. For this we need logical operators.
Note the difference between assignment (=) and comparison (==).
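A minimal sketch of the difference, with an illustrative value:

```r
x = 3     # assignment: x now holds 3 (x <- 3 does the same)
x == 3    # comparison: TRUE
x != 3    # FALSE
x > 2     # TRUE
x <= 2    # FALSE
```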
Let’s say we want to check if a variable is larger than two other variables:
Comparison operators (>, ==, etc.) are evaluated before Boolean operators (& and |).
Boolean operators require logical arguments, so R coerces a number such as 4 as as.logical(4) would before applying the operator. All non-zero numbers are coerced to TRUE; only 0 is FALSE.
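A sketch of the pitfall, with hypothetical values chosen so that the two versions disagree:

```r
x <- 3
y <- 2
z <- 4

# Tempting but wrong: parsed as (x > y) & z, and z is coerced to TRUE
wrong <- x > y & z
wrong   # TRUE, although x is not larger than z

# Correct: spell out both comparisons
right <- x > y & x > z
right   # FALSE
```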
But! When the vectors are of different length, R “recycles” the shorter vector.
Warning in c(1, 2, 3) > c(1, 3): longer object length is not a multiple of
shorter object length
[1] FALSE FALSE TRUE
When the length of one object is a multiple of the length of the other, there is not even a warning.
Be careful!
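For instance, comparing a vector of length 4 with one of length 2 recycles silently (illustrative numbers):

```r
# c(1, 3) is recycled to c(1, 3, 1, 3) without any warning
silent <- c(1, 2, 3, 4) > c(1, 3)
silent   # FALSE FALSE TRUE TRUE
```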
The same is true for Boolean operators:
Warning in c(TRUE, FALSE) & c(TRUE, FALSE, FALSE): longer object length is not
a multiple of shorter object length
[1] TRUE FALSE FALSE
To require scalars for our comparison, we can use && and || instead:
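A sketch with scalar conditions. && and || also stop evaluating as soon as the result is known, which the stop() calls below illustrate. In recent R versions, passing a vector of length greater than one to && or || is an error, while older versions behaved differently, so the example sticks to scalars.

```r
x <- 3   # illustrative value

in_range <- x > 1 && x < 5
in_range   # TRUE

# The right-hand side is never evaluated, so stop() is not triggered
short_and <- FALSE && stop("never evaluated")
short_or  <- TRUE  || stop("never evaluated")
short_and  # FALSE
short_or   # TRUE
```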
In R, NA is treated as truly “missing”: the value could be anything.
But:
Here, it does not matter what NA could be, since both TRUE | TRUE and FALSE | TRUE evaluate to TRUE.
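A short sketch of both cases: operations whose result depends on the missing value return NA, while operations whose result is determined either way return a value.

```r
NA > 1       # NA: depends on the missing value
NA == NA     # NA: two unknown values may or may not be equal
NA & TRUE    # NA: depends on the missing value

NA | TRUE    # TRUE: true whatever NA is
NA & FALSE   # FALSE: false whatever NA is
```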
What if we want to check which values of a vector are missing:
! negates a logical statement:
For example, we might want to keep only the non-missing values by running !is.na():
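A sketch with a hypothetical vector v:

```r
v <- c(4, NA, 7)   # hypothetical vector with one missing value

is.na(v)           # FALSE TRUE FALSE
!is.na(v)          # TRUE FALSE TRUE

non_missing <- v[!is.na(v)]
non_missing        # 4 7
```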
To check if a scalar is an element of a vector:
Works just as well on character vectors
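This is what the %in% operator does; a small sketch with illustrative values:

```r
3 %in% c(1, 2, 3)          # TRUE
5 %in% c(1, 2, 3)          # FALSE
"b" %in% c("a", "b", "c")  # TRUE
"d" %in% c("a", "b", "c")  # FALSE
```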
Arithmetic operations on floating points are not exact:
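A classic illustration, together with all.equal(), which compares numbers up to a small tolerance:

```r
exact <- 0.1 + 0.2 == 0.3
exact    # FALSE: the two sides differ in the last binary digits

approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))
approx   # TRUE: equal up to a small tolerance
```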
An if statement can be used directly in an assignment (not recommended)
If statements should normally be written on multiple rows with brackets:
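A sketch of the multi-row form; temperature and label are hypothetical names:

```r
temperature <- 25   # hypothetical value

if (temperature > 20) {
  label <- "warm"
} else {
  label <- "cold"
}

label   # "warm"
```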
if can only evaluate logical scalars. For vectors, use ifelse()
Error in if (c(3, 1) == c(1, 4)) "foo" else "bar": the condition has length > 1
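The vectorized version of the failing condition above could be written with ifelse(), which evaluates the test element by element:

```r
same <- ifelse(c(3, 1) == c(1, 4), "foo", "bar")
same     # "bar" "bar"

larger <- ifelse(c(3, 1) > c(1, 4), "foo", "bar")
larger   # "foo" "bar"
```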
We can nest multiple ifelse():
Or use data.table::fcase() (or dplyr::case_when())
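A base R sketch of nested ifelse() calls. The income vector and the cut-offs are hypothetical; fcase() and case_when() express the same logic as a flat list of conditions and need their packages installed, so they are left out here.

```r
income <- c(150, 420, 900)   # hypothetical incomes

# The inner ifelse() is only used where the outer test is FALSE
group <- ifelse(income < 200, "low",
                ifelse(income < 600, "middle", "high"))
group   # "low" "middle" "high"
```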
Subsetting
[]
a[n] selects the n:th element of the vector a
We can supply integers:
Or logicals:
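A sketch of both kinds of index on a hypothetical vector a:

```r
a <- c(10, 20, 30, 40)   # hypothetical vector

a[2]                             # 20
a[c(1, 3)]                       # 10 30
a[c(TRUE, FALSE, FALSE, TRUE)]   # 10 40
a[a > 15]                        # a comparison gives a logical index: 20 30 40
```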
[[]] or $
$named_elem
[1] 1 2 3
[[2]]
[1] "a" "b" "c"
Let’s pick the first element of the list:
Do you see the difference? [] returns a list of length 1 while [[]] returns the vector stored as the first element of a_list (try typeof() to see).
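A sketch of the comparison. The contents of a_list are reconstructed from the printed output above, so the exact values and types are an assumption:

```r
# Assumed definition, reconstructed from the printed list
a_list <- list(named_elem = c(1, 2, 3), c("a", "b", "c"))

single <- a_list[1]     # still a list, of length 1
double <- a_list[[1]]   # the vector stored in the first element

typeof(single)   # "list"
typeof(double)   # "double"
```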
[[]] or $ (cont.)
a$x is a shorthand for a[["x"]]
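A minimal check of the equivalence, using a hypothetical list a with one element named x:

```r
a <- list(x = 1:3)   # hypothetical list

a$x        # 1 2 3
a[["x"]]   # 1 2 3
same_result <- identical(a$x, a[["x"]])
same_result   # TRUE
```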
Scoping
Variables defined in functions are not “global”
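A sketch with a hypothetical function f(): the variable created inside it disappears when the function returns.

```r
f <- function() {
  inside <- 10   # exists only while f() is running
  inside
}

result <- f()
result                           # 10
visible <- exists("inside")      # is there an object called inside out here?
visible                          # FALSE, assuming no such object was created elsewhere
```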
But what if the object does not exist in the function scope?
R looks “one level up”
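A sketch of the lookup with hypothetical functions g() and h(): g() does not define z, so R finds it in the environment where g() was defined, while h() defines its own z and leaves the outer one untouched.

```r
z <- 100

g <- function() {
  z + 1          # z is not defined in g(), so R uses the outer z
}

h <- function() {
  z <- 1         # a local z that hides the outer one inside h()
  z + 1
}

g()   # 101
h()   # 2
z     # still 100
```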
Let’s say we want to run a regression on the data in the df data.frame we created
R cannot find the y variable. We need to look “inside” the df object.
We can supply each vector separately
…or we can use with() to call lm() “inside” df:
…or we can use the data argument of lm()
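A sketch of the three alternatives. The contents of df are hypothetical, since the original definition is not shown; all three calls fit the same model.

```r
# Hypothetical data standing in for the df created earlier
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

fit_vectors <- lm(df$y ~ df$x)         # supply each vector separately
fit_with    <- with(df, lm(y ~ x))     # evaluate lm() "inside" df
fit_data    <- lm(y ~ x, data = df)    # let lm() look up y and x in df

coef(fit_data)
```

The data argument is usually the most convenient of the three, since the coefficient names stay short and functions such as predict() can reuse the variable names.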
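Packages are installed with install.packages() and loaded with library()
library() attaches a package so that its functions can be called from the global environment
:: calls a function from a package without attaching it, as in pillar::glimpse()
Let’s try to look at the source code for the mean() function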
mean() calls different methods depending on object class
[1] mean.Date mean.default mean.difftime mean.IDate*
[5] mean.ITime* mean.POSIXct mean.POSIXlt mean.quosure*
[9] mean.vctrs_vctr*
see '?methods' for accessing help and source code
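A sketch of how this can be explored. The list printed by methods(mean) depends on which packages are loaded, so it will differ between sessions; the dates are illustrative.

```r
mean            # prints a short generic that only dispatches to a method
methods(mean)   # lists the methods available in the current session
mean.default    # the method used for plain numeric vectors

# Dispatch in action: a Date vector is handled by mean.Date
dates <- as.Date(c("2025-04-14", "2025-04-18"))
mean_date <- mean(dates)
mean_date         # "2025-04-16"
class(mean_date)  # "Date"
```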